Abstract. Upcoming large redshift surveys potentially allow precision
measurements of the galaxy power spectrum. To accurately measure $P(k)$ on
the largest scales, comparable to the depth of the survey, it is crucial that
finite volume effects are accurately corrected for in the data analysis. Here
we derive analytic expressions for the one such effect that has not
previously been worked out exactly: that of the so-called integral constraint.
We also show that for data analysis methods based on counts in cells,
multiple constraints can be included via simple matrix operations, thereby
rendering the results less sensitive to galactic extinction and misestimates
of the shape of the radial selection function.



1. Introduction 

Observational data on galaxy clustering are rapidly increasing in both
quantity and quality, which brings new challenges when it comes to data
analysis. As to quantity, redshifts had been published for a few thousand
galaxies 15 years ago. Today the number is $\sim 10^5$, and ongoing projects
such as the AAT 2dF Survey and the Sloan Digital Sky Survey (SDSS) will raise
it to $10^6$ within a few years. Comprehensive reviews of past redshift
surveys are given by e.g. Efstathiou (1994), Vogeley (1995), Strauss & Willick
(1995) and Strauss (1996), the last also including a detailed description of
2dF and SDSS. As to quality, more accurate and uniform photometric selection
criteria (enabled by e.g. the well-calibrated 5-band photometry of the SDSS)
reduce potential systematic errors.

This increased data quality makes it desirable to avoid approximations 
in the data analysis process and to use methods that can constrain cos- 
mological quantities as accurately as possible, without bias. Here we will 
focus on how to correct for the finite volume of a survey. As is well known, 
this causes the measured power spectrum to be a convolution of the true 
power spectrum with some window function which depends on the survey 
geometry and the data analysis method used. Exact expressions have been 
derived (see e.g. Feldman, Kaiser & Peacock 1994, hereafter "FKP") for 
the window function and its normalization for the case where the number 
density of galaxies is assumed to be known a priori, but the more realistic 
case where the mean galaxy density is determined from the survey itself has 
thus far only been treated approximately (Peacock & Nicholson 1991; Park
et al. 1994). The main purpose of this paper is to derive exact expressions 
for this important correction. 

The methods for power spectrum estimation that have been proposed 
in the literature fall into two categories: 

1. Direct Fourier methods 

2. Pixelized methods 

The direct Fourier methods make use of the exact position of each galaxy, 
whereas the other methods start by "pixelizing" the data set (by com- 
puting counts in cells or expansion coefficients for some set of functions), 
thereby reducing the problem to manipulating large vectors and matrices. 
In Section 2, we will derive the finite-volume correction for direct Fourier
methods. The corresponding correction for pixelized methods is given in 
Section 3. 

2. Finite Volume Correction for Direct Fourier Methods 
2.1. THE POWER SPECTRUM ESTIMATION PROBLEM 

It is customary (see e.g. FKP) to model the observed galaxy distribution
as a 3D Poisson process $n(\mathbf{r}) = \sum_j \delta(\mathbf{r} - \mathbf{r}_j)$
with intensity $\lambda(\mathbf{r}) = \bar n(\mathbf{r})[1 + \delta_r(\mathbf{r})]$.
The function $\bar n$ is the selection function of the galaxy survey under
consideration, i.e., $\bar n(\mathbf{r})\,dV$ is the expected (not the observed)
number of galaxies in a volume $dV$ about $\mathbf{r}$. The density
fluctuations $\delta_r$ are modeled as a homogeneous and isotropic (but not
necessarily Gaussian) random field with power spectrum $P(k)$, and the power
spectrum estimation problem is to estimate $P(k)$ given a realization of
$n(\mathbf{r})$.

2.2. THE DIRECT FOURIER APPROACH



Due to space limitations, the method summary below is very brief, and the 
interested reader is referred to FKP and Tegmark (1995, hereafter "T95") 
for more detailed introductions to the various methods. 

All direct Fourier methods not involving random numbers^1 are specified
by choosing a weight function $\psi(\mathbf{r})$ in real space and a set of
weights $w_i$ in Fourier space, as defined below. They all involve the
following two steps:

1. At a grid of points $\mathbf{k}_i$ in Fourier space, fluctuation amplitudes
are estimated by

$$\hat F(\mathbf{k}_i) \equiv \int \left[\frac{n(\mathbf{r})}{\bar n(\mathbf{r})} - 1\right] \psi(\mathbf{r})\, e^{-i\mathbf{k}_i\cdot\mathbf{r}}\, d^3r. \tag{1}$$

(Here and throughout, hats denote Fourier transforms.)

2. The power $P$ at some given $k$-value, say $k_*$, is estimated by squaring
these fluctuation amplitudes, subtracting off their shot noise bias,
rescaling them to correct for the integral constraint, and averaging
them with some weights $w_i$ that add up to unity:

$$\tilde P(k_*) \equiv \sum_i w_i\, \frac{|\hat F(\mathbf{k}_i)|^2 - \sigma_s^2(\mathbf{k}_i)}{N(\mathbf{k}_i)}. \tag{2}$$
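The two steps above can be sketched numerically as follows. This is a minimal illustration, not the implementation used by FKP or T95: the function names `fluctuation_amplitudes` and `band_power` are hypothetical, the Fourier transform of the weight function is assumed to be supplied by the caller, and the shot noise and normalization arrays are taken as given inputs (they are to be computed from equations (3) and (4)).

```python
import numpy as np

def fluctuation_amplitudes(pos, psi, nbar, k_vecs, psi_hat):
    """Equation (1) for a point set: since n(r) is a sum of delta functions,
    the integral of (n/nbar) * psi * exp(-i k.r) becomes a sum over galaxies.

    pos     : (Ngal, 3) galaxy positions r_j
    psi     : callable, weight function at the positions -> (Ngal,)
    nbar    : callable, selection function at the positions -> (Ngal,)
    k_vecs  : (Nk, 3) wave vectors k_i
    psi_hat : (Nk,) Fourier transform of psi at the k_i (supplied by caller)
    """
    weights = psi(pos) / nbar(pos)
    phases = np.exp(-1j * (k_vecs @ pos.T))        # shape (Nk, Ngal)
    return phases @ weights - psi_hat

def band_power(F, sigma2_s, N, w):
    """Equation (2): weighted average of shot-noise-corrected, rescaled |F|^2."""
    w = np.asarray(w, dtype=float)
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("the weights w_i must add up to unity")
    return np.sum(w * (np.abs(F) ** 2 - sigma2_s) / N)

# Tiny hand-checkable example: one galaxy at the origin, psi = 1, nbar = 2.
pos = np.zeros((1, 3))
k_vecs = np.array([[0.3, 0.0, 0.0], [0.0, 0.5, 0.0]])
F = fluctuation_amplitudes(pos,
                           lambda p: np.ones(len(p)),
                           lambda p: np.full(len(p), 2.0),
                           k_vecs, psi_hat=np.zeros(2))
P_est = band_power(F, sigma2_s=np.array([0.1, 0.1]), N=np.ones(2), w=[0.5, 0.5])
print(F, P_est)
```

The direct sum over galaxies costs of order (number of modes) x (number of galaxies) operations; for large surveys a gridded FFT would normally replace it, which is a design choice outside the scope of this sketch.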



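As we will show in Section 2.6, the new and exact expressions for the shot
noise and integral constraint corrections (when $\bar n$ is normalized so that
$\hat F(0) = 0$) are

$$\sigma_s^2(\mathbf{k}) = \left[1 + \left|\frac{\hat\psi(\mathbf{k})}{\hat\psi(0)}\right|^2\right] \hat\zeta_s(0) - 2\,\mathrm{Re}\left[\frac{\hat\psi(\mathbf{k})^*}{\hat\psi(0)^*}\,\hat\zeta_s(\mathbf{k})\right], \tag{3}$$

$$N(\mathbf{k}) = \left[1 + \left|\frac{\hat\psi(\mathbf{k})}{\hat\psi(0)}\right|^2\right] \hat f(0) - 2\,\mathrm{Re}\left[\frac{\hat\psi(\mathbf{k})^*}{\hat\psi(0)^*}\,\hat f(\mathbf{k})\right], \tag{4}$$

where the functions $\hat\zeta_s$ and $\hat f$ are defined by

$$\hat\zeta_s(\mathbf{k}) \equiv \int \frac{\psi(\mathbf{r})^2}{\bar n(\mathbf{r})}\, e^{-i\mathbf{k}\cdot\mathbf{r}}\, d^3r, \tag{5}$$

$$\hat f(\mathbf{k}) \equiv \int \psi(\mathbf{r})^2\, e^{-i\mathbf{k}\cdot\mathbf{r}}\, d^3r. \tag{6}$$

^1 Including a random mock survey as in equation (2.1.3) in FKP can never give
minimal error bars, since inclusion of random numbers will always increase the
variance of the estimator.

If the survey is volume limited, then $\bar n$ is independent of $\mathbf{r}$,
$\hat\zeta_s(\mathbf{k}) = \hat f(\mathbf{k})/\bar n$, and
$\sigma_s^2(\mathbf{k})/N(\mathbf{k}) = 1/\bar n$.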
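As a rough numerical illustration of equations (3)-(6), the sketch below evaluates the shot noise and normalization corrections for the special case where both the weight function and the selection function are spherically symmetric, so that all Fourier transforms are real and reduce to radial integrals. The Gaussian weight, the exponential selection function and all numerical values are assumptions made purely for illustration, and the helper names `radial_ft` and `corrections` are hypothetical.

```python
import numpy as np

def radial_ft(g, k, r):
    """3D Fourier transform at wavenumber k of a spherically symmetric
    function tabulated as g on the uniform radial grid r (trapezoid rule)."""
    # np.sinc(x) = sin(pi x)/(pi x), so sin(kr)/(kr) = np.sinc(k r / pi).
    integrand = 4.0 * np.pi * r**2 * g * np.sinc(k * r / np.pi)
    dr = r[1] - r[0]
    return dr * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

def corrections(psi, nbar, k, r):
    """Equations (3) and (4) for real, spherically symmetric psi and nbar.
    All transforms are then real, so Re[...] and the conjugates are trivial."""
    a = radial_ft(psi, k, r) / radial_ft(psi, 0.0, r)    # psi_hat(k)/psi_hat(0)
    zeta0 = radial_ft(psi**2 / nbar, 0.0, r)             # eq. (5) at k = 0
    zetak = radial_ft(psi**2 / nbar, k, r)               # eq. (5) at k
    f0 = radial_ft(psi**2, 0.0, r)                       # eq. (6) at k = 0
    fk = radial_ft(psi**2, k, r)                         # eq. (6) at k
    sigma2_s = (1.0 + a**2) * zeta0 - 2.0 * a * zetak    # eq. (3)
    N = (1.0 + a**2) * f0 - 2.0 * a * fk                 # eq. (4)
    return sigma2_s, N

R = 100.0                                   # assumed weight scale
r = np.linspace(0.0, 12.0 * R, 24001)
psi = np.exp(-0.5 * (r / R) ** 2)           # assumed Gaussian weight
k = 0.02

# Volume-limited case: constant nbar, so sigma_s^2 / N should equal 1/nbar.
nbar_vol = np.full_like(r, 0.01)
s2_vol, N_vol = corrections(psi, nbar_vol, k, r)

# Flux-limited case (assumed exponential decline): the ratio exceeds 1/nbar(0).
nbar_flux = 0.01 * np.exp(-r / (2.0 * R))
s2_flux, N_flux = corrections(psi, nbar_flux, k, r)
print(s2_vol / N_vol, s2_flux / N_flux)
```

For a survey geometry without spherical symmetry the same expressions would be evaluated with full three-dimensional transforms; the radial shortcut is used here only to keep the example short.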

2.3. WEIGHTING THE GALAXIES 

Four different choices of the galaxy weighting function $\psi$ have appeared in
the literature:

$$\psi(\mathbf{r}) = \begin{cases} 1 & \text{inside survey volume} \\ 0 & \text{outside survey volume} \end{cases} \tag{7}$$

$$\psi(\mathbf{r}) = \bar n(\mathbf{r}) \tag{8}$$

$$\psi(\mathbf{r}) = \frac{\bar n(\mathbf{r})}{1 + \bar n(\mathbf{r}) P} \tag{9}$$

$$\psi(\mathbf{r}) = \text{eigenfunction of } \left[\nabla^2 - \frac{\gamma}{\bar n(\mathbf{r})}\right] \tag{10}$$

The first choice, i.e., weighting all galaxies in a survey volume equally, was
employed by e.g. Fisher et al. (1993). The second choice was used for e.g.
the APM survey (Baugh & Efstathiou 1994): since redshifts were not
measured, the radial galaxy weighting by default became the selection
function (moreover, modes could of course only be computed in the directions
perpendicular to the line of sight). The third choice is that advocated by
FKP, where $P$ denotes an a priori guess as to the power in the band under
consideration, and minimizes the variance in the limit when $k^{-1} \ll$ the
depth of the survey. The fourth choice corresponds to the method of T95, and
gives the narrowest window function for a given variance (the constant
$\gamma$ determines the tradeoff).
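The first three weighting schemes are simple enough to write down directly. The sketch below is illustrative only: the function names are hypothetical, the top-hat weight uses "selection function greater than zero" as a stand-in for "inside the survey volume", and the exponential selection function and the value of the prior power are assumed numbers. The fourth scheme requires solving an eigenvalue problem and is not attempted here.

```python
import numpy as np

def psi_tophat(nbar):
    """Equation (7): equal weight wherever the survey contains galaxies
    (nbar > 0 is used here as a proxy for 'inside the survey volume')."""
    return np.where(np.asarray(nbar) > 0, 1.0, 0.0)

def psi_selection(nbar):
    """Equation (8): weight equal to the selection function itself."""
    return np.asarray(nbar, dtype=float)

def psi_fkp(nbar, P):
    """Equation (9): FKP weight for an a priori power guess P."""
    nbar = np.asarray(nbar, dtype=float)
    return nbar / (1.0 + nbar * P)

# Hypothetical flux-limited selection function along the line of sight.
r = np.linspace(0.0, 500.0, 6)              # distance, arbitrary units
nbar = 0.05 * np.exp(-r / 100.0)            # assumed shape, illustration only
P_guess = 5000.0                            # assumed prior power, units of 1/nbar

for name, psi in [("tophat", psi_tophat(nbar)),
                  ("selection", psi_selection(nbar)),
                  ("FKP", psi_fkp(nbar, P_guess))]:
    print(name, psi / psi[0])               # weights relative to r = 0
```

The FKP weight interpolates between the other two: it approaches a constant where the product of selection function and prior power is large, and approaches the selection function itself where that product is small.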



2.4. WEIGHTING THE FOURIER MODES 

As to the weights in Fourier space, $w_i$, a common choice (e.g. FKP) is to
perform a straight average of all modes in a spherical shell with its radius
centered on $k_*$, although when the survey volume is anisotropic, a weighted
average giving smaller error bars can generally be obtained by solving a
quadratic programming problem (T95).



2.5. WINDOW FUNCTIONS 

The expectation value of a power estimate $\tilde P$ that has been corrected for
the shot noise bias and the integral constraint can always be written as

$$\langle \tilde P \rangle = \int_0^\infty W(k)\, P(k)\, dk, \tag{11}$$

where the function $W$, known as the window function, has the property
that

$$\int_0^\infty W(k)\, dk = 1. \tag{12}$$

We can therefore think of $\tilde P(k_*)$ as measuring a weighted average of the
true power spectrum, with $W$ specifying the weights (for most methods,
but not all, these weights are strictly non-negative as well). The window
function for a general direct Fourier method is derived in Section 2.6, and
is found to be

$$W(k) \propto \sum_i w_i \int |\hat\psi_i(\mathbf{k})|^2\, k^2\, d\Omega_k, \tag{13}$$

where $\psi_i$ is given by equation (18) and the angular $\mathbf{k}$-integral is
carried out over a spherical shell of radius $k$. In the limit where
$k^{-1} \ll L$, where $L$ is the smallest survey dimension, the window
function simplifies to

$$W(k) \propto \sum_i w_i \int |\hat\psi(\mathbf{k} - \mathbf{k}_i)|^2\, k^2\, d\Omega_k.$$
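To give a feeling for the simplified window function, the sketch below evaluates it for a single mode and a Gaussian weight function, for which the angular integral can be done in closed form. The Gaussian weight, the values of `R` and `k_i`, and the function name `gaussian_window` are assumptions for illustration; the integral-constraint term in equation (18) is neglected, as is appropriate in the limit just described.

```python
import numpy as np

def gaussian_window(k, k_i, R):
    """1D window function W(k) for a single mode k_i and a weight function
    proportional to exp[-(r/R)^2/2], neglecting the integral-constraint
    term, normalized to unit integral on the uniform grid k."""
    # |psi_hat(q)|^2 is proportional to exp[-(qR)^2]. Integrating
    # exp[-R^2 |k - k_i|^2] over the direction of k and multiplying by k^2
    # gives, up to constant factors, the expression below.
    shape = k * (np.exp(-(R * (k - k_i)) ** 2) - np.exp(-(R * (k + k_i)) ** 2))
    dk = k[1] - k[0]
    area = dk * (shape.sum() - 0.5 * (shape[0] + shape[-1]))   # trapezoid rule
    return shape / area

R = 100.0                         # assumed survey scale
k_i = 0.1                         # assumed mode
k = np.linspace(0.0, 0.3, 3001)
W = gaussian_window(k, k_i, R)
dk = k[1] - k[0]
k_peak = k[np.argmax(W)]
width = np.sqrt(np.sum(W * (k - k_peak) ** 2) * dk)            # rms width
print(k_peak, width)
```

The window is centered close to the chosen mode and has a width of order the inverse survey scale, which is the sense in which the estimate measures a weighted average of the true power spectrum.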

2.6. DERIVATION OF THE INTEGRAL CONSTRAINT CORRECTION 

If we knew the selection function $\bar n(\mathbf{r})$ a priori, before counting
the galaxies in our survey, we would be able to probe the power on the largest
scales. For modes of wavelength much larger than the survey, this would
essentially correspond to counting the difference between the observed and
expected number of galaxies in our sample. Of course, we do not know $\bar n$
a priori, so our most accurate way of normalizing the selection function
is by using the galaxies in the survey itself. When $\bar n$ is normalized in
this way, naive application of equation (1) will give the artifact
$\hat F(\mathbf{k}) \to 0$ as $k \to 0$ because fluctuations on the scale of the
survey are forced to zero by definition (Peacock & Nicholson 1991).

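Let us assume that we know the shape of the selection function but not
its normalization. To reflect this, we write

$$\bar n(\mathbf{r}) = \eta\, n_0(\mathbf{r}), \tag{14}$$

where $n_0$ is our guess as to the shape and $\eta$ is an unknown normalization
constant. If we had used $n_0$ in place of the correct $\bar n$ in equation (1),
we would in general not obtain the desired result
$\langle \hat F(\mathbf{k}_i) \rangle = 0$ but rather
$\langle \hat F(\mathbf{k}_i) \rangle = (\eta - 1)\hat\psi(\mathbf{k}_i)$, which
would enter equation (2) as a systematic positive power bias. It is the need to
eliminate this problem that forces us to impose an integral constraint. Let
$\tilde\eta$ denote our estimate of $\eta$. We will choose $\tilde\eta$ so that
this bias vanishes, i.e., so that the integral constraint

$$\hat F(0) = \int \left[\frac{n(\mathbf{r})}{\tilde\eta\, n_0(\mathbf{r})} - 1\right] \psi(\mathbf{r})\, d^3r = 0 \tag{15}$$

holds, or explicitly,

$$\tilde\eta = \frac{1}{\hat\psi(0)} \int \frac{n(\mathbf{r})}{n_0(\mathbf{r})}\, \psi(\mathbf{r})\, d^3r = \frac{1}{\hat\psi(0)} \sum_j \frac{\psi(\mathbf{r}_j)}{n_0(\mathbf{r}_j)}. \tag{16}$$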



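This is an unbiased estimator of the density normalization, since
$\langle \tilde\eta \rangle = \eta$, the true value. Substituting
$\bar n(\mathbf{r}) = \tilde\eta\, n_0(\mathbf{r})$ and equation (16) into
equation (1), we obtain

$$\begin{aligned}
\hat F(\mathbf{k}_i)
&= \int \left[\frac{n(\mathbf{r})}{\tilde\eta\, n_0(\mathbf{r})} - 1\right] \psi(\mathbf{r})\, e^{-i\mathbf{k}_i\cdot\mathbf{r}}\, d^3r \\
&= \frac{1}{\tilde\eta} \int \frac{n(\mathbf{r})}{n_0(\mathbf{r})} \left[e^{-i\mathbf{k}_i\cdot\mathbf{r}} - \frac{\hat\psi(\mathbf{k}_i)}{\hat\psi(0)}\right] \psi(\mathbf{r})\, d^3r \\
&= \frac{\eta}{\tilde\eta} \int \frac{n(\mathbf{r})}{\bar n(\mathbf{r})}\, \psi_i(\mathbf{r})\, d^3r \\
&= \frac{\eta}{\tilde\eta} \int \left[\frac{n(\mathbf{r})}{\bar n(\mathbf{r})} - 1\right] \psi_i(\mathbf{r})\, d^3r,
\end{aligned} \tag{17}$$

where the function $\psi_i$ is defined by

$$\psi_i(\mathbf{r}) \equiv \left[e^{-i\mathbf{k}_i\cdot\mathbf{r}} - \frac{\hat\psi(\mathbf{k}_i)}{\hat\psi(0)}\right] \psi(\mathbf{r}). \tag{18}$$

In the last two lines of equation (17), $\bar n = \eta n_0$ is the true selection
function, and the final step uses $\int \psi_i(\mathbf{r})\, d^3r = 0$.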



Since we will have $\tilde\eta \approx \eta$ with a relative accuracy
$\Delta\eta/\eta$ of order $N^{-1/2}$, where $N$ is the number of galaxies in
the survey, we can to a good approximation treat $\tilde\eta$ as a known
constant from here on and take $\eta/\tilde\eta = 1$ on the last line of
equation (17). Since $\hat\psi_i(0) = 0$, we now have
$\langle \hat F(\mathbf{k}_i) \rangle = 0$, so we see that we have succeeded in
eliminating the above-mentioned power bias. The price for this is slightly more
complicated equations. Let us now derive the expressions for the shot noise
correction and normalization given in equations (3) and (4).
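The normalization estimator of equation (16) and its expected accuracy can be checked with a small Monte Carlo experiment. The sketch below assumes an unclustered (pure Poisson) galaxy distribution in a unit box, a top-hat weight function as in equation (7), and a hypothetical selection function shape `n0` that declines linearly along one axis; none of these choices come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

L = 1.0                      # box side; psi = 1 inside the box (equation 7)
eta_true = 2.0e4             # true normalization (hypothetical value)

def n0(pos):
    """Assumed known *shape* of the selection function (hypothetical)."""
    return 1.0 - 0.8 * pos[:, 0] / L        # declines along x, maximum value 1

# Draw a Poisson sample with intensity eta_true * n0(r) by thinning a
# homogeneous Poisson process of intensity eta_true (n0 <= 1 everywhere).
n_cand = rng.poisson(eta_true * L**3)
cand = rng.uniform(0.0, L, size=(n_cand, 3))
keep = rng.uniform(size=n_cand) < n0(cand)
galaxies = cand[keep]

# Equation (16): psi_hat(0) is the integral of psi, i.e. the box volume.
psi_hat0 = L**3
eta_est = np.sum(1.0 / n0(galaxies)) / psi_hat0

n_gal = len(galaxies)
rel_err = abs(eta_est - eta_true) / eta_true
print(n_gal, eta_est, rel_err, n_gal ** -0.5)
```

In this setup the relative error of the estimate should be comparable to the inverse square root of the number of galaxies, which is the justification for treating the estimated normalization as a known constant in what follows.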

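Since $\tilde\eta \approx \eta$, we substitute the last expression of equation
(17) into equation (3) of T95, treating $\bar n = \eta n_0$ as a known function,
which gives

$$\langle |\hat F(\mathbf{k}_i)|^2 \rangle = \int |\hat\psi_i(\mathbf{k})|^2\, P(k)\, \frac{d^3k}{(2\pi)^3} + \int \frac{|\psi_i(\mathbf{r})|^2}{\bar n(\mathbf{r})}\, d^3r. \tag{19}$$

Comparing this with equation (11) and equation (2), we identify the
three-dimensional window function as

$$W(\mathbf{k}) \propto |\hat\psi_i(\mathbf{k})|^2 \tag{20}$$

and see that the shot noise correction is

$$\sigma_s^2(\mathbf{k}_i) = \int \frac{|\psi_i(\mathbf{r})|^2}{\bar n(\mathbf{r})}\, d^3r = \int \left| e^{-i\mathbf{k}_i\cdot\mathbf{r}} - \frac{\hat\psi(\mathbf{k}_i)}{\hat\psi(0)} \right|^2 \frac{\psi(\mathbf{r})^2}{\bar n(\mathbf{r})}\, d^3r, \tag{21}$$

and expanding the square completes our derivation of equation (3). Performing
an angular integral of equation (20) completes the proof of equation (13).
The normalization coefficient $N(\mathbf{k}_i)$ of equation (2) is determined
by the requirement that the window function integrate to unity, i.e.,
$N(\mathbf{k}_i) = \int |\hat\psi_i(\mathbf{k})|^2\, d^3k/(2\pi)^3$. Using
Parseval's theorem, we obtain

$$N(\mathbf{k}_i) = \int \left| e^{-i\mathbf{k}_i\cdot\mathbf{r}} - \frac{\hat\psi(\mathbf{k}_i)}{\hat\psi(0)} \right|^2 \psi(\mathbf{r})^2\, d^3r, \tag{22}$$

and expanding the square as above completes our derivation of equation (4).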



Figure 1. The exact expression for $N(k)$ is plotted together with the
approximation of Park et al. (1994) for a Gaussian weight function
$\psi(r) \propto \exp[-(r/R)^2/2]$, $R = 100\,h^{-1}$ Mpc.

2.7. HOW IMPORTANT IS THIS CORRECTION?

Let us evaluate the integral constraint correction factor $N(\mathbf{k})$ for a
couple of simple examples. We first note that for the special case of equation
(7), we have $\psi(\mathbf{r})^2 \propto \psi(\mathbf{r})$. Hence
$\hat f(\mathbf{k}) \propto \hat\psi(\mathbf{k})$, and equation (4) reduces to

$$N(\mathbf{k}) = \left[1 - \left|\frac{\hat\psi(\mathbf{k})}{\hat\psi(0)}\right|^2\right] \hat f(0), \tag{23}$$

which we recognize as the approximation of Park et al. (1994). For
volume-limited surveys, the prescriptions given by equations (8), (9) and (10)
all coincide, so we see that this approximation becomes exact for the
volume-limited case with these galaxy weighting schemes. For flux-limited
surveys, on the other hand, these schemes all give a decreasing weight function
$\psi$, since $\bar n$ decreases with distance. For the simple Gaussian case
$\psi(\mathbf{r}) = \exp[-(r/R)^2/2]/(\pi^{3/4} R^{3/2})$, equation (4) gives

$$N(\mathbf{k}) = 1 + e^{-(kR)^2} - 2e^{-\frac{3}{4}(kR)^2}, \tag{24}$$

whereas the approximation (23) gives

$$N(\mathbf{k}) = 1 - e^{-(kR)^2}. \tag{25}$$

A Taylor expansion shows that for $kR \ll 1$, equation (24) gives
$N \approx (kR)^2/2$ whereas equation (25) gives $N \approx (kR)^2$, so the
latter overestimates $N$ by a factor of two, as illustrated in Figure 1.
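The comparison between the exact and approximate normalization for the Gaussian weight function can be reproduced with a few lines of code. The sketch below works in units where the Gaussian scale is unity, evaluates the two closed-form expressions, and cross-checks the exact one against a brute-force radial integration of equation (22). The function names and the integration grid are choices made for this illustration.

```python
import numpy as np

def n_exact(kR):
    """Equation (24); expm1 avoids cancellation for kR << 1."""
    x = np.asarray(kR, dtype=float) ** 2
    return np.expm1(-x) - 2.0 * np.expm1(-0.75 * x)

def n_park(kR):
    """Equation (25), i.e. the approximation (23) for the same weight."""
    x = np.asarray(kR, dtype=float) ** 2
    return -np.expm1(-x)

def n_direct(kR, r_max=12.0, n_r=24001):
    """Equation (22) by direct radial integration, in units where R = 1."""
    r = np.linspace(0.0, r_max, n_r)
    psi2 = np.exp(-r**2) / np.pi**1.5                 # psi(r)^2, unit integral
    a = np.exp(-0.5 * kR**2)                          # psi_hat(k)/psi_hat(0)
    # Angle average of |exp(-i k.r) - a|^2 is 1 + a^2 - 2 a sin(kr)/(kr).
    bracket = 1.0 + a**2 - 2.0 * a * np.sinc(kR * r / np.pi)
    integrand = 4.0 * np.pi * r**2 * psi2 * bracket
    dr = r[1] - r[0]
    return dr * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

for kR in (0.01, 0.1, 0.5, 1.0, 2.0):
    print(kR, n_exact(kR), n_direct(kR), n_park(kR) / n_exact(kR))
```

The last printed column is the factor by which the approximation overestimates the normalization; it approaches two on scales much larger than the survey and approaches unity on small scales, where the integral constraint matters little.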

3. Finite Volume Correction for Pixelized Methods 

3.1. PIXELIZED METHODS 

Pixelized data analysis methods start by reducing the galaxy survey problem
to one similar to that occurring in cosmic microwave background (CMB)
experiments: estimating a power spectrum given noisy fluctuation
measurements in a number of discrete "pixels". After this, the remaining
steps are quite analogous to the CMB case, and involve mere linear algebra
(operations such as matrix inversion, diagonalization, etc.). Let us define
the overdensity in $N$ "pixels" $x_1, \ldots, x_N$ by

$$x_i \equiv \int \left[\frac{n(\mathbf{r})}{\bar n(\mathbf{r})} - 1\right] \psi_i(\mathbf{r})\, d^3r \tag{26}$$

for some set of functions $\psi_i$. Although the specific choices of $\psi_i$ are
irrelevant for our present discussion, common choices are to either make these
functions fairly localized in real space (in which case the pixelization is a
generalized form of counts in cells) or fairly localized in Fourier space (in
which case one refers to the functions as "modes" and to $x_i$ as expansion
coefficients). Let us group the pixels $x_i$ into an $N$-dimensional vector
$\mathbf{x}$. All proposed pixelized methods assume that the mean and the
covariance matrix of this pixel vector are

$$\langle \mathbf{x} \rangle = 0, \tag{27}$$

$$\langle \mathbf{x}\mathbf{x}^t \rangle = \mathbf{C}, \tag{28}$$

where $\mathbf{C}$ depends in some known way on the power spectrum. Once the
problem has been cast in this form, the power spectrum can be estimated
using standard machinery, with either a brute force likelihood analysis (as
in e.g. White & Bunn), a Karhunen-Loève eigenmode analysis (Karhunen
1947; Vogeley & Szalay 1996; Tegmark, Taylor & Heavens 1997) or a direct
quadratic analysis (Hamilton 1997ab; Tegmark 1997).

3.2. DERIVATION OF THE INTEGRAL CONSTRAINT CORRECTION

For pixelized methods of power spectrum estimation, the procedure for
dealing with the integral constraint is quite analogous to that for direct
Fourier methods. However, as we will now show, it is much simpler to
implement. For counts in cells, for example, one simply removes the mean
from all rows and columns of the covariance matrix $\mathbf{C}$ before
proceeding with the analysis. Because of this simplicity, one can, at an almost
negligible numerical cost, take a more ambitious approach and allow for more
than one unknown parameter in the selection function. For instance, one can
impose the constraints that the radial fluctuation average equals zero for
a few hundred different angular bins, thereby eliminating the sensitivity
to galactic extinction variations on this scale, as well as requiring that
the angular fluctuation average vanish for a number of radial bins to be
insensitive to errors in estimating the precise shape of $\bar n$.
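Let us parametrize the true selection function $\bar n$ as

$$\bar n(\mathbf{r}) = \sum_{j=1}^{M} \eta_j\, \bar n_j(\mathbf{r}), \tag{29}$$

where $\bar n_j$ are known functions and the "nuisance parameters" $\eta_j$,
which we group into an $M$-dimensional vector $\boldsymbol{\eta}$, are a priori
unknown. Let $n_0$ denote some a priori estimate of $\bar n$. Defining the
"uncorrected" pixels as

$$x'_i \equiv \int \frac{n(\mathbf{r})}{n_0(\mathbf{r})}\, \psi_i(\mathbf{r})\, d^3r, \tag{30}$$

we find that

$$\langle \mathbf{x}' \rangle = \mathbf{Z}\boldsymbol{\eta}, \tag{31}$$

where the $N \times M$-dimensional matrix $\mathbf{Z}$ is defined by

$$Z_{ij} \equiv \int \frac{\bar n_j(\mathbf{r})}{n_0(\mathbf{r})}\, \psi_i(\mathbf{r})\, d^3r. \tag{32}$$

This means that in general, $\langle \mathbf{x}' \rangle \neq 0$, so the
uncorrected data set does not satisfy equation (27). Instead, its statistical
properties depend on the unknown nuisance parameters $\boldsymbol{\eta}$.
However, we can easily construct a new "corrected" data set whose mean is
independent of $\boldsymbol{\eta}$. Let us define

$$\mathbf{x} \equiv \boldsymbol{\Pi}\mathbf{x}', \tag{33}$$

where

$$\boldsymbol{\Pi} \equiv \mathbf{I} - \tilde{\mathbf{Z}}\tilde{\mathbf{Z}}^t, \tag{34}$$

and $\tilde{\mathbf{Z}}$ is a matrix whose columns form an orthonormal basis
($\tilde{\mathbf{Z}}^t\tilde{\mathbf{Z}} = \mathbf{I}$) for the space spanned by
the columns of $\mathbf{Z}$.^2 $\boldsymbol{\Pi}$ is a symmetric
($\boldsymbol{\Pi}^t = \boldsymbol{\Pi}$) projection matrix
($\boldsymbol{\Pi}^2 = \boldsymbol{\Pi}$) projecting onto the subspace
orthogonal to the columns of $\mathbf{Z}$, i.e.,
$\boldsymbol{\Pi}\mathbf{Z} = 0$.

^2 Such a matrix $\tilde{\mathbf{Z}}$ is readily constructed by orthonormalizing
the columns of $\mathbf{Z}$ with a Gram-Schmidt or Cholesky procedure (as in
e.g. Tegmark & Bunn 1995).

Our corrected data set $\mathbf{x}$ satisfies equation (27), since
$\langle \mathbf{x} \rangle = \boldsymbol{\Pi}\mathbf{Z}\boldsymbol{\eta} = 0$.
Letting $\mathbf{C}'$ denote the covariance matrix of the uncorrected
data set, the corrected data will have the covariance matrix

$$\mathbf{C} = \langle \mathbf{x}\mathbf{x}^t \rangle = \boldsymbol{\Pi}\mathbf{C}'\boldsymbol{\Pi}. \tag{35}$$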
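The projection step is only a few lines of linear algebra. The following sketch builds the projection matrix from a QR factorization (used here in place of the Gram-Schmidt or Cholesky procedures, as an equivalent way of orthonormalizing the columns), applies equations (33) and (35) to randomly generated stand-ins for the constraint matrix and the uncorrected covariance matrix, and then checks the counts-in-cells special case in which a single constraint with a constant column reduces to removing row and column means. The random inputs and the function name `constraint_projector` are assumptions made for illustration.

```python
import numpy as np

def constraint_projector(Z):
    """Return (Pi, Z_tilde) of equation (34) for an N x M matrix Z of full
    column rank. Reduced QR gives orthonormal columns spanning col(Z)."""
    Z_tilde, _ = np.linalg.qr(Z)
    Pi = np.eye(Z.shape[0]) - Z_tilde @ Z_tilde.T
    return Pi, Z_tilde

rng = np.random.default_rng(1)
N, M = 50, 3
Z = rng.normal(size=(N, M))                 # hypothetical constraint matrix
A = rng.normal(size=(N, N))
C_prime = A @ A.T + N * np.eye(N)           # hypothetical positive definite C'
eta = rng.normal(size=M)                    # unknown nuisance parameters
x_prime = Z @ eta + rng.multivariate_normal(np.zeros(N), C_prime)

Pi, Z_tilde = constraint_projector(Z)
x = Pi @ x_prime                            # equation (33)
C = Pi @ C_prime @ Pi                       # equation (35)

# Single constraint with a constant column: Pi C' Pi removes the mean from
# all rows and columns of C'.
Pi1, _ = constraint_projector(np.ones((N, 1)))
C1 = Pi1 @ C_prime @ Pi1
row_mean = C_prime.mean(axis=1, keepdims=True)
col_mean = C_prime.mean(axis=0, keepdims=True)
C1_direct = C_prime - row_mean - col_mean + C_prime.mean()
print(np.abs(Pi @ Z).max(), np.abs(C1 - C1_direct).max())
```

Because the projector annihilates every column of the constraint matrix, the corrected pixels are insensitive to the nuisance parameters whatever their values, at the cost of losing one degree of freedom per constraint.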

Once $\mathbf{x}$ and $\mathbf{C}$ have been computed, the rest of the pixelized
analysis proceeds just as if there had been no integral constraints. The only
complication is that $\mathbf{C}$ is now singular, having rank $N - M$ instead
of $N$. As shown in the Appendix of T97, the correct way to deal with this is to
replace all occurrences of $\mathbf{C}^{-1}$ (which is of course undefined) by
the "pseudo-inverse" of $\mathbf{C}$, defined as

$$\boldsymbol{\Pi} \left[\mathbf{C} + \gamma\, \mathbf{Z}\mathbf{Z}^t\right]^{-1} \boldsymbol{\Pi} \tag{36}$$

for some constant $\gamma \neq 0$. The result is independent of $\gamma$, but a
good choice for numerical stability is $\gamma \sim c/N$, where $c$ is the order
of magnitude of a typical matrix element of $\mathbf{C}$.
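The pseudo-inverse of equation (36) and its independence of the regularization constant can be verified numerically. The sketch below uses randomly generated stand-ins for the constraint matrix and the uncorrected covariance matrix (both hypothetical), and checks the result against the defining properties of a generalized inverse. The function name `pseudo_inverse` is an illustrative choice.

```python
import numpy as np

def pseudo_inverse(C, Z, gamma):
    """Equation (36): Pi [C + gamma Z Z^t]^{-1} Pi, with Pi built from an
    orthonormal basis for the columns of Z as in equation (34)."""
    Q, _ = np.linalg.qr(Z)                      # orthonormal columns
    Pi = np.eye(len(C)) - Q @ Q.T
    # Solve a linear system instead of forming the inverse explicitly.
    return Pi @ np.linalg.solve(C + gamma * (Z @ Z.T), Pi)

rng = np.random.default_rng(2)
N, M = 40, 4
Z = rng.normal(size=(N, M))                     # hypothetical constraint matrix
A = rng.normal(size=(N, N))
C_prime = A @ A.T + N * np.eye(N)               # hypothetical uncorrected C'
Q, _ = np.linalg.qr(Z)
Pi = np.eye(N) - Q @ Q.T
C = Pi @ C_prime @ Pi                           # singular, rank N - M

c = np.abs(C).mean()                            # typical element size
P1 = pseudo_inverse(C, Z, c / N)
P2 = pseudo_inverse(C, Z, 100.0 * c / N)
print(np.abs(P1 - P2).max(), np.abs(C @ P1 @ C - C).max())
```

The two values of the constant give the same matrix to rounding error, so the constant affects only the conditioning of the intermediate linear solve.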

The author wishes to thank Josh Frieman, Andrew Hamilton, Michael
Strauss and Michael Vogeley for helpful comments on the manuscript.